Piranha: Exploiting Single-Chip Multiprocessing

Authors

  • Luiz André Barroso
  • Kourosh Gharachorloo
  • Tom Heynemann
  • Dan Joyce
  • David Lowell
  • Harland Maxwell
  • Joel McCormack
  • Ravishankar Mosur
  • Jeff Sprouse
  • Robert Stets
  • Scott Smith

Abstract

…parently reordering instructions from nearby program regions, but even sophisticated compiler scheduling is fundamentally limited by the compiler's inability to determine the programmer's intent perfectly and by its commitment to preserving the program's high-level structure and semantics. Given the amount of parallel work being done, we could conceivably build a superscalar processor with an instruction window large enough to hold code from different program regions (specifically, different functions or loop iterations) at the same time. However, beyond the many engineering obstacles, maintaining a large, contiguous window full of useful instructions poses a fundamental problem. Because any branch prediction may be wrong, the likelihood that every prediction in a series is correct falls exponentially with the number of branches, so instructions at the tail of the window are increasingly unlikely to be useful.

Overcoming this problem requires a model that lets parallelism from different program regions be exploited in a reasonably independent (that is, noncontiguous and nonserial) manner. The speculative multithreading model treats each program region as a speculative thread, or small program. By executing multiple speculative threads in parallel, high degrees of concurrency can be achieved in aggregate, especially if each thread is mostly sequential. The model then merges the threads to recreate the original program. Speculative multithreading lets us fashion a large instruction window…
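The exponential decline mentioned in the abstract can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that every branch is predicted independently with the same accuracy p, so an instruction that sits behind n unresolved branches is on the correct path with probability p^n. The function names and the 95% accuracy figure are hypothetical and are not taken from the paper.

```python
# Hypothetical model: each branch is predicted independently with the same
# accuracy p. An instruction behind n unresolved branches is useful only if
# all n predictions were correct, so the probability is p ** n.


def useful_probability(p: float, n_branches: int) -> float:
    """Probability that all n_branches predictions ahead of an instruction are correct."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("prediction accuracy p must be in [0, 1]")
    if n_branches < 0:
        raise ValueError("n_branches must be non-negative")
    return p ** n_branches


def window_profile(p: float, branch_counts: list[int]) -> dict[int, float]:
    """Map each branch depth to the probability that instructions at that depth are useful."""
    return {n: useful_probability(p, n) for n in branch_counts}


if __name__ == "__main__":
    # Illustrative 95%-accurate predictor; the depths are arbitrary examples.
    for depth, prob in window_profile(0.95, [0, 1, 5, 10, 20, 40]).items():
        print(f"after {depth:2d} branches: {prob:.3f}")
```

Under these assumptions, even a 95%-accurate predictor leaves only about a 0.60 chance that instructions behind 10 branches are useful, and about 0.36 behind 20 branches. This is why a single contiguous window stops paying off as it grows, while independent speculative threads each start from a fresh, well-predicted point.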

Similar resources

Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors

As compared to a complex single processor based system, on-chip multiprocessors are less complex, more power efficient, and easier to test and validate. In this work, we focus on an on-chip multiprocessor where each processor has a local memory (or cache). We demonstrate that, in such an architecture, allowing each processor to do off-chip memory requests on behalf of other processors can impro...

Integrating Parallelizing Compilation Technology and Processor Architecture for Cost-Effective Concurrent multithreading

As the number of transistors on a single chip continues to grow, it is important to think beyond the traditional approaches of compiler optimizations for deeper pipelines and wider instruction issue units to improve performance. This single-threaded execution model limits these approaches to exploiting only the relatively small amount of instruction-level parallelism available in application pr...

C-slow Technique vs Multiprocessor in designing Low Area Customized Instruction set Processor for Embedded Applications

The demand for high-performance embedded processors for consumer electronics has been increasing rapidly over the past few years. Many of these embedded processors depend upon a custom-built Instruction Set Architecture (ISA), such as game processors (GPU), multimedia processors, DSP processors, etc. The primary requirement of the consumer electronics industry is low cost with high performance and low power cons...

Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor

Thread-level speculation (TLS) makes it possible to parallelize general purpose C programs. This paper proposes software and hardware mechanisms that support speculative thread-level execution on a single-chip multiprocessor. A detailed analysis of programs using the TLS execution model shows a bound on the performance of a TLS machine that is promising. In particular, TLS makes it feasible to ...

Exploiting the Potential of a Network of IRAMs

Recently, a great deal of research has gone into reducing the gap in performance between processors and their memory systems. Techniques such as prefetching have been developed in order to hide the long latencies involved in retrieving data from off-chip DRAM. However, applications with irregular access patterns generally see greatly reduced benefit from these techniques, and latencies are becomi...

Publication date: 2001